AITopics | informativeness and representativeness

Not All Out-of-Distribution Data Are Harmful to Open-Set Active Learning

Neural Information Processing Systems, Dec-24-2025, 09:33:44 GMT

Active learning (AL) methods have proven effective at reducing labeling effort by intelligently selecting valuable instances for annotation. Despite their success in in-distribution (ID) scenarios, AL methods suffer from performance degradation in many real-world applications because out-of-distribution (OOD) instances are inevitably contained in unlabeled data, which may lead to inefficient sampling. Several works have therefore explored open-set AL, strategically selecting pure ID instances while filtering out OOD instances. However, concentrating solely on selecting pseudo-ID instances may constrain the training of both the ID classifier and the OOD detector. To address this issue, we propose a simple yet effective sampling scheme, Progressive Active Learning (PAL), which employs a progressive sampling mechanism to actively select valuable OOD instances. PAL scores unlabeled instances by jointly evaluating their informativeness and representativeness, and can thus balance the pseudo-ID and pseudo-OOD instances in each round to enhance the capacity of both the ID classifier and the OOD detector.

name change, open-set active learning, out-of-distribution data, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

6c5f877b2d78e093860ce9715e251dec-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 05:16:09 GMT

eigenvalue, learning, node, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New Jersey (0.04)

Genre:

Instructional Material (0.67)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Education (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Robust Offline Active Learning on Graphs

Wu, Yuanchen, Yuan, Yubai

arXiv.org Artificial Intelligence, Aug-15-2024

We consider the problem of active learning on graphs, which has crucial applications in many real-world networks where labeling node responses is expensive. In this paper, we propose an offline active learning method that selects nodes to query by explicitly incorporating information from both the network structure and node covariates. Building on graph signal recovery theory and the random spectral sparsification technique, the proposed method adopts a two-stage biased sampling strategy that takes both informativeness and representativeness into consideration for node querying. Informativeness refers to the complexity of graph signals that are learnable from the responses of queried nodes, while representativeness refers to the capacity of queried nodes to control generalization errors given noisy node-level information. We establish a theoretical relationship between generalization error and the number of nodes selected by the proposed method. Our theoretical results demonstrate the trade-off between informativeness and representativeness in active learning. Extensive numerical experiments show that the proposed method is competitive with existing graph-based active learning methods, especially when node covariates and responses contain noise. Additionally, the proposed method is applicable to both regression and classification tasks on graphs.

covariate, learning, node, (15 more...)

arXiv.org Artificial Intelligence

2408.07941

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New Jersey (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

ImitAL: Learned Active Learning Strategy on Synthetic Data

Gonsior, Julius, Thiele, Maik, Lehner, Wolfgang

arXiv.org Artificial Intelligence, Aug-24-2022

Active Learning (AL) is a well-known standard method for efficiently obtaining annotated data by first labeling the samples that contain the most information based on a query strategy. In the past, a large variety of such query strategies has been proposed, with each generation of new strategies increasing the runtime and adding more complexity. However, to the best of our knowledge, none of these strategies excels consistently over a large number of datasets from different application domains. Basically, most of the existing AL strategies are a combination of the two simple heuristics informativeness and representativeness, and the big differences lie in how these often conflicting heuristics are combined. Within this paper, we propose ImitAL, a domain-independent novel query strategy, which encodes AL as a learning-to-rank problem and learns an optimal combination of both heuristics. We train ImitAL on large-scale simulated AL runs on purely synthetic datasets. To show that ImitAL was successfully trained, we perform an extensive evaluation comparing our strategy on 13 different datasets, from a wide range of domains, with 7 other query strategies.
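
The idea of combining the two heuristics can be illustrated with a small sketch. This is not ImitAL, which learns the combination, nor any other method listed on this page; it is a generic density-weighted heuristic, assuming informativeness is measured by the entropy of the predicted class distribution and representativeness by the mean RBF similarity to the rest of the unlabeled pool. The function names, the `beta` exponent, and the toy data are all hypothetical.

```python
import math

def entropy(probs):
    """Informativeness: Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_similarity(i, points):
    """Representativeness: mean RBF similarity of point i to the other pool points."""
    others = [q for j, q in enumerate(points) if j != i]
    if not others:
        return 0.0
    sims = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], q)))
        for q in others
    ]
    return sum(sims) / len(sims)

def select_query(points, probs, beta=1.0):
    """Score each pool point as entropy * similarity**beta; return (best index, scores)."""
    scores = [
        entropy(probs[i]) * mean_similarity(i, points) ** beta
        for i in range(len(points))
    ]
    best = max(range(len(points)), key=scores.__getitem__)
    return best, scores

# Toy pool (assumed data): three clustered points and one distant outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
# Assumed model predictions: points 1 and 3 are maximally uncertain.
probs = [(0.9, 0.1), (0.5, 0.5), (0.6, 0.4), (0.5, 0.5)]

best, scores = select_query(points, probs)
print(best, [round(s, 3) for s in scores])
```

In this toy pool the outlier is as uncertain as point 1 but has almost no neighbors, so the representativeness term pushes its score toward zero and the clustered uncertain point is picked instead. Raising `beta` weights representativeness more heavily, which is one simple way to expose the trade-off between the two heuristics.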

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-18840-4_4

2208.11636

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Saxony > Dresden (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Active Learning by Querying Informative and Representative Examples

Huang, Sheng-jun, Jin, Rong, Zhou, Zhi-hua

Neural Information Processing Systems, Dec-31-2010

Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge with a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way of measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches.

active learning, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: